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Abstract 

Face recognition stands out as a singular case of object recognition: although most faces are very much 
alike, people discriminate between many different faces with outstanding efficiency. Even though little is 
known about the mechanisms of face recognition, viewpoint dependence, a recurrent characteristic of many 
research on faces, could inform algorithms and representations. Poggio and Vetter's symmetry argument 
[10] predicts that learning only one view of a face may be sufficient for recognition, if this view allows the 
computation of a symmetric, "virtual," view. More specifically, as faces are roughly bilaterally symmetric 
objects, learning a side-view-which always has a symmetric view-should give rise to better generalization 
performances than learning the frontal view. It is also predicted that among all new views, a virtual view 
should be best recognized. We ran two psychophysical experiments to test these predictions. Stimuli were 
views of 3D models of laser-scanned faces. Only shape was available for recognition; all other face cues- 
texture, color, hair, etc. -were removed from the stimuli. The first experiment tested whether a particular 
view of a face was canonical. The second experiment tested which single views of a face give rise to best 
generalization performances. The results were compatible with the symmetry argument: face recognition 
from a single view is always better when the learned view allows the computation of a symmetric view. 
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1 Introduction 

In object recognition, it is often assumed that within- 
class discriminations are more difficult than between- 
class discriminations. For example, while people would 
experience no difficulty to segregate a car from a tree, 
it would be comparatively more complex to distinguish 
among brands of cars or species of trees. Researchers 
explain this discrepancy by the nature of the com- 
parisons involved: within-class judgments distinguish 
objects comparatively more similar than between-class 
judgments. Face recognition stands out as a notable ex- 
ception to the generality of this claim. Although most 
faces are very much alike-they share the same overall 
shape, textures and other features-people discriminate 
between many different faces with outstanding efficiency. 
Face recognition is a singular case of near perfect recogni- 
tion whose underlying mechanisms are of utmost interest 
to computer vision and psychophysics. 

Even though face recognition is well documented by 
psychophysical and neurophysiological studies, little is 
known about its algorithmic and representational char- 
acteristics. Converging evidence gathered across disci- 
plinary boundaries report a phenomenon which could in- 
form algorithmic and representational issues: Face recog- 
nition is viewpoint dependent. 

In single cell recordings studies, Perrett and his col- 
laborators discovered cells of the macaque superior tem- 
poral sulcus (STS) which are preferentially tuned to re- 
spond to specific views of a head [3, 6, 8, 7]. Most 
of the cells were viewer- centered responding unimodally 
to one view (either the frontal, the two profiles or the 
back views); few cells were tuned to other views of the 
360 degree range. Human psychophysics also reports a 
viewpoint preference compatible with view-based repre- 
sentations of faces. Among all views, the 3/4 view-the 
viewpoint between the full-face and the profile views-is 
identified faster and with greater accuracy [1, 14, 11, 4] 
(see [2, 12, 13] for other evidence of viewpoint depedent 
object recognition). The preference for a 3/4 view is 
naturally interpreted in light of Perrett 's findings as the 
view which elicits the highest total activity from the pro- 
file and full-face neurons; an activation higher than the 
response of the individual cells to their preferred view. 
In summary, neurophysiological and psychological data 
suggest two major constraints on representations and al- 
gorithms for face recognition: 1) faces could be repre- 
sented in memory with collections of few viewer-centered 
2D views and 2) viewpoint dependence could be sub- 
sumed by the tuning curves of viewpoint specific cortical 
cells. 

Does the side-view preference phenomenon reveal 
something substantial about face representation and 
recognition? Poggio and Vetter [10] showed that the 
recognition of a bilaterally symmetric object from a novel 
view could be achieved if only one nonsingular view of 
the object is known. If perception "assumes" symmetry, 



it could generate a symmetric "virtual" view from the 
only known view, or exploit equivalent information. A 
face is approximately bilaterally symmetric. Side-views 
of a face, before occlusion becomes too critical, are non- 
singular views from which a symmetric view can be gen- 
erated. The full-face view, however, is singular. In an 
RBF network [9], if units were centered on a side-view 
and its symmetric, together they could cover a larger 
range of the rotation of a face than a single unit cen- 
tered on the full-face view. The aims of this paper is 
to test the psychophysical reality of the symmetry argu- 
ment for face recognition. More precisely, we will test 
the following claims: 

• The side-view preference results from an interac- 
tion of the learned view of a face and recognition 
of other views. 

• Nonsingular views of a face-views from which a 
symmetric 2D view can be generated-give rise to 
better generalization performances than singular 
views. 

• Virtual views are generalized better than the other 
novel views of a face. 

2 Experiment 1 

The first experiment is a simple control. If a particu- 
lar view of a face is inherently more informative than 
any other view, it should always be preferred in recogni- 
tion. Side-views which conjugate part of the shape fea- 
tures and part of the profile could be canonical [5] in this 
sense. To test for canonical views, we trained subjects 
on all views of different faces before a testing stage on 
all views. If all views are experienced equivalently dur- 
ing learning, a canonical view should give rise to better 
accuracy and/or faster identification performances. 

2.1 Methods 

The psychophysics of face recognition must control the 
subject's familiarity with the stimuli as well as the type 
of information available for the task. Features such as 
hair color, hairstyle, texture or color of the skin, type and 
size of eyebrows are invariant under rotation in depth. 
With familiar faces, such shortcuts could lead to the type 
of viewpoint invariant face recognition discussed in [14]. 
To control familiarity and information, all faces were un- 
known to subjects prior to the experiment, and faces 
were presented as grey-level images of 3D shape mod- 
els. That is, obvious viewpoint invariant features were 
removed from the stimuli and we only tested shaped- 
based face recognition. 

2.1.1 Subjects 

11 subjects (age group 18-30) with normal or cor- 
rected vision, volonteered their time to participate to 
the experiment. 



Testing view 


-36 


-18 





18 


36 


Hit rate 


.96 


.91 


.91 


.91 


.86 


False alarm 


.14 


.18 


.05 


.05 


.14 


d' 


2.83 


2.26 


2.98 


2.98 


2.16 



Table 1: Hit rate, false alarm and d' for different views 
of the stimuli in Experiment 1. 

2.1.2 Stimuli 

Experiment 1 and 2 used the same set of stimuli. 
Stimuli were 256 grey-level views of 3D face models pre- 
sented on the monitor of a Silicon Graphics worksta- 
tion. There were 15 different face models; face data 
were laser-scanned three-dimensional coordinates of real 
faces. Each face was reconstructed by approximating the 
face data with a bicubic BSpline surface. Stimuli were 
views of each face at -36, -18, 0, 18, 36 degrees of rota- 
tion in depth (0 degree is the frontal view, see Figure 1). 
Faces were illuminated by a point light source located at 
the observer, shaded with a Gouraud shading model. 

2.1.3 Procedure 

The experiment was decomposed into ten blocks. A 
block consisted of a learning stage and a testing stage. In 
the learning stage, subjects had to learn a particular face 
(the target face). The target face rotated on the screen, 
once clockwise, once counterclockwise-or vice versa, de- 
pending on a random selection. The apparent rotation 
was produced by showing the five views of the target 
face in rapid succession (100 ms/view, for a total of 1 
sec/face). The learning stage was immediately followed 
by a testing stage. Test items were two views in the 
same orientation, presented one at a time-orientations 
were selected randomly. One view was a view of the 
target face, and the other, a view of the distractor face. 
For each view, subjects had to indicate whether or not it 
was a view of the target face by pressing the appropriate 
response-key on the computer keyboard. The experi- 
ment was completed after 10 blocks as just described. 
A different target face was associated with each block. 
Each of the 5 viewpoints was tested twice, each time 
with a different target. 

2.2 Results and Discussion 

To test for a viewpoint preference in recognition, we com- 
pared the mean percentage of correct recognition of the 
target in the 5 testing conditions. A one-way ANOVA 
revealed no significant effect of viewpoint (-F(4, 40) = 
.31, p= .87, ns.). Table 1 shows the hit rate, false alarm 
rate and d' for the identification of the stimuli in Exper- 
iment 1. 

Although subjects responded almost equivalently well 
to all views, it could still be argued that some views 
are correctly identified faster than others. A one-way 
ANOVA showed no effect of viewpoint {F(A, 40) = 



.78, p = .54, ns.) on reaction time for correct identifi- 
cation. Average reaction time across all views was 811 
ms. These results suggest that there is no viewpoint 
preference in face recognition when all views are experi- 
enced during learning. Thus, viewpoint dependent face 
recognition cannot simply be attributed to a recognition 
preference for certain views over others. 

3 Experiment 2 

The results of Experiment 1 showed that no view is 
canonical. Poggio and Vetter's symmetry argument pre- 
dicts that viewpoint preference could arise from an inter- 
action between the view learned (whether it is a singular 
or a nonsingular view) and the recognition stage. The 
aim of the second experiment is to test this prediction 
and to understand further the nature of the interaction. 
In a learning stage, distinct groups of subjects learned 
each a different view of the faces. All subjects were then 
tested on all views of the faces. We expected differences 
in performance between subjects who were in the singu- 
lar view group from those who were in the other groups. 

3.1 Methods 

3.1.1 Subjects 

30 subjects volunteered their time to participated to 
Experiment 2. They were randomly assigned to condi- 
tion. 

3.1.2 Stimuli 

Stimuli were identical to those of Experiment 1: 5 
views of 15 face models. 

3.1.3 Procedure 

Subjects were randomly assigned to one of five train- 
ing condition: the -36, -18, 0, 18, or 36 degree view. For 
example, subjects in group -36 only saw one view of a 
target face during learning: the -36 view (see Figure 1). 
The procedure of Experiment 2 was very similar to the 
one of Experiment 1. The experiment was segmented 
into 10 blocks. A block was composed of a learning and 
a testing stage. Here, however, subjects learned only 
one view of the target face. The view was presented for 
1 second, immediately followed by a testing stage. The 
testing stage also consisted of two successive views in 
the same orientation: one view of the testing face and 
a view of a distractor face. In 2 out of the 10 testing 
blocks, the testing view was the same as the learning 
view. The remaining 4 pairs of 2 blocks were each assign 
to a different testing view. A different target was asso- 
ciated with each block, and each possible viewpoint was 
tested twice, each time with a different target. During 
the experiment, subjects only saw one view of a partic- 
ular face in learning, and only one testing view of the 
same face. With this design, we could test how changing 
the learning view affected recognition performances. 



Figure 1: This figure illustrates the stimuli used in Experiment 1 and 2. The top pictures and the bottom 
pictures represent two different faces. From the left to the right, both sets of pictures show the -36, -18, 0, 
18, and 36 views used in the experiments. The views were computed from 3D face models reconstructed 
by approximating laser-scanned 3D coordinates of real faces with a BSpline surface. All textural, color, 
and hair cues were removed from the stimuli. A point light source located at the observer illuminated the 
Gouraud shaded surface of the faces. 



Learning view 


-36 


-18 





18 


36 


Hit rate 


.86 


.78 


.56 


.72 


.78 


False alarm 


.12 


.18 


.19 


.18 


.06 


d' 


2.26 


1.68 


1.03 


1.5 


2.26 



Table 2: Hit rate, false alarm and d' for different views 
of the stimuli in Experiment 2. 



in recognition performances between condition and all 
the other learning conditions (F(l, 1) = 14.55, p< .001) 
and this comparaison also interacted with the testing 
views (-F(l,4) = 4.28, p < .01). A second orthogo- 
nal test showed no significant difference between learn- 
ing conditions -18 and 18 contrasted to -36 and 36 
(F(l,l) = 1.06, p = .31). Figure 2 illustrates the in- 
teraction. 



3.2 Results and Discussion 

A two-way ANOVA was run to test for a dependence 
between the view learned and recognition performances 
as measured by percent correct recognition of the tar- 
get face. The results showed a main effect of learn- 
ing view (F(4, 25) = 4.17, p = .01), no main effect of 
testing view (-F(4, 16) = 1.87, p = .13, ns.) and a sig- 
nificant interaction of learning view and testing view 
(F(16, 100) = 2.03,p= .017). The absence of significant 
effect of testing view comes as no surprise. As shown 
in Experiment 1, no single view, by itself, stands out in 
recognition. Table 2 illustrates the overall recognition 
performances as a function of learning view. 

The data reveal a strong interaction between the 
learned view of a face and generalization to other views 
of the same face. To understand further this interac- 
tion, we contrasted recognition performance in learning 
condition (the frontal view) to all other learning con- 
ditions. The contrast revealed a significant difference 



In Figure 2, the hit rate to the different testing views 
as a function of learning condition reveals an interesting 
trend. The symmetry argument argument predicts that 
a U shaped generalization curve should describe the re- 
sponse profiles to the different testing views. The peaks 
of the curve, should be roughly located on the learned 
view, and on its symmetric view. Although further evi- 
dence are required to confirm the trend, a generalization 
curve of this form characterizes the group which learned 
the 36 view. 

An inverted U shape characterizes bad generalization 
performances- a sharp decrease of performance with in- 
creasing rotation in depth from the learned view. Such 
a response profile distinguishes subjects who learned the 
0-full-face-view. Since the full-face view is singular, a 
second view could not be computed from the frontal 
view. New views of the faces were not recognized with 
high accuracy. The intermediary group (the 18 group) 
displays a response profile in-between the two extremes. 

To summarize, these experiments on face recognition 




Testing view 

■ -36 

■ -18 

■ 

□ 18 

□ 36 



Learning View 

Figure 2: This figure illustrates the results of Experiment 2. The different learning conditions are grouped as 
a function of degrees of rotation from the full- face view. The histograms illustrate the hit rate to the different 
testing views. As predicted by the symmetry argument, the inverted U curve indictating poor generalization 
performances for the singular full-face view tends to turn into a U shaped generalization curve as the degree 
of rotation of the learned view increases. 



are compatible with the predictions of the symmetry 
argument. Experiment 1 showed that no single view 
was canonical. The second experiment showed that face 
recognition could be achieved from a non-singular view. 
These data suggest that a side-view should be preferred 
over a full-face view because a side-view allows better 
face encoding and recognition. 
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